Towards Advanced Monitoring for Scientific Workflows
Scientific workflows consist of thousands of highly parallelized tasks
executed in a distributed environment involving many components. Automatic
tracing and investigation of the components' and tasks' performance metrics,
traces, and behavior are necessary to support the end user with a level of
abstraction, since the large amount of data cannot be analyzed manually. The
execution and monitoring of scientific workflows involve many components: the
cluster infrastructure, its resource manager, the workflow, and the workflow
tasks. All components in such an execution environment access different
monitoring metrics and provide metrics on different abstraction levels. The
combination and analysis of observed metrics from different components and
their interdependencies remain largely unexplored.
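To make the idea of combining metrics from different components concrete, the following is a minimal sketch and not the authors' design. It assumes that every observation carries a task identifier that can be used to relate it to observations from other components. The component names follow the list in the text; the `Metric` type, the `combine_by_task` helper, the metric names, and all values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Component names follow the text; everything else here is hypothetical.
COMPONENTS = ("infrastructure", "resource_manager", "workflow", "task")


@dataclass(frozen=True)
class Metric:
    component: str  # which component reported the value
    task_id: str    # assumed key for relating observations across components
    name: str       # hypothetical metric name
    value: float


def combine_by_task(metrics):
    """Group observations from all components under the task they relate to."""
    combined = defaultdict(dict)
    for m in metrics:
        if m.component not in COMPONENTS:
            raise ValueError(f"unknown component: {m.component}")
        # Prefix the metric name with its source so the origin stays visible.
        combined[m.task_id][f"{m.component}.{m.name}"] = m.value
    return dict(combined)


# Illustrative values only, not measured data.
observations = [
    Metric("infrastructure", "t1", "node_cpu_util", 0.93),
    Metric("resource_manager", "t1", "queue_wait_s", 12.0),
    Metric("workflow", "t1", "retries", 1.0),
    Metric("task", "t1", "runtime_s", 340.0),
]
view = combine_by_task(observations)
```

Keying on a shared task identifier is only one possible way to relate observations; a real system might instead correlate by node, time window, or workflow run.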
We specify four different monitoring layers that can serve as an
architectural blueprint for the monitoring responsibilities and the
interactions of components in the scientific workflow execution context. We
describe the different monitoring metrics subject to the four layers and how
the layers interact. Finally, we examine five state-of-the-art scientific
workflow management systems (SWMS) in order to assess which steps are needed to
enable our four-layer-based approach.
Comment: Paper accepted in 2022 IEEE International Conference on Big Data
Workshop SCDM 202